A Supplementary Material A.1 AMWC Heuristic

Neural Information Processing Systems

Whenever a merge operation is performed, the corresponding edge is contracted and new edges can potentially be created (Lines 7-17). Afterwards, the clusters belonging to the non-partitionable class (i.e., stuff) are merged. Table 2 contains the hyperparameters used for fully differentiable training. We can see that optimizing the PQ surrogate gives better performance, and that using separate losses decreases the performance, especially on 'thing' classes. Table 5 shows that all trials improve over the baseline with fully differentiable training.
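The merge step described above can be illustrated with a small sketch. This is a hypothetical toy example, not the authors' implementation: contracting an edge absorbs one cluster into another, and edges to shared neighbours are combined into a single new edge.

```python
# Hypothetical sketch of one merge operation: contracting the edge (a, b)
# absorbs cluster b into cluster a; parallel edges that arise are summed.

def merge_clusters(edges, a, b):
    """Contract the edge (a, b) in an edge-weight map.

    `edges` maps frozenset({u, v}) -> weight. Edges incident to b are
    redirected to a; when a and b shared a neighbour, a combined edge
    is effectively created by summing the two weights.
    """
    merged = {}
    for uv, w in edges.items():
        u, v = tuple(uv)
        u = a if u == b else u
        v = a if v == b else v
        if u == v:  # the contracted edge itself disappears
            continue
        key = frozenset({u, v})
        merged[key] = merged.get(key, 0.0) + w
    return merged

edges = {frozenset({1, 2}): 0.9, frozenset({2, 3}): 0.4, frozenset({1, 3}): 0.2}
edges = merge_clusters(edges, 1, 2)  # contract (1, 2); edges to node 3 combine
print(edges)  # only the combined edge {1, 3} remains, weight 0.4 + 0.2
```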


PolarMix Supplemental Material

Neural Information Processing Systems

We first implement global augmentation approaches, including random rotation and random scaling, on two LiDAR scans separately and then concatenate them for training. As shown in the '1, 2, 3' rows of the table, segmentation performance improves with the number of copies, which indicates the effectiveness of the approach in enriching the data distribution. In this section, we conduct experiments to analyze how PolarMix benefits LiDAR point cloud learning. As a comparison, PolarMix is more robust to the instance spatial location, without much performance drop. PolarMix clearly improves the robustness of the baseline with respect to the angular variations of instances.
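The baseline augmentation described above can be sketched as follows. This is a minimal illustrative example, not the official PolarMix code: each scan receives an independent random yaw rotation and global scaling, and the augmented copies are then concatenated into one training sample.

```python
# Illustrative sketch: independently rotate and scale two LiDAR scans,
# then concatenate them for training (baseline global augmentation).
import numpy as np

def augment(points, rng):
    """points: (N, 3) array of x, y, z coordinates."""
    theta = rng.uniform(0, 2 * np.pi)   # random yaw rotation about z
    scale = rng.uniform(0.95, 1.05)     # random global scaling
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return scale * points @ rot.T

rng = np.random.default_rng(0)
scan_a = rng.normal(size=(1000, 3))  # stand-in for a real LiDAR scan
scan_b = rng.normal(size=(800, 3))
mixed = np.concatenate([augment(scan_a, rng), augment(scan_b, rng)], axis=0)
print(mixed.shape)  # (1800, 3)
```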


Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli

Neural Information Processing Systems

Humans excel at detecting and segmenting moving objects according to the Gestalt principle of "common fate". Remarkably, previous works have shown that human perception generalizes this principle in a zero-shot fashion to unseen textures or random dots. In this work, we seek to better understand the computational basis for this capability by evaluating a broad range of optical flow models and a neuroscience-inspired motion energy model for zero-shot figure-ground segmentation of random dot stimuli. Specifically, we use the extensively validated motion energy model proposed by Simoncelli and Heeger in 1998, which is fitted to neural recordings in cortex area MT. We find that a cross section of 40 deep optical flow models trained on different datasets struggle to estimate motion patterns in random dot videos, resulting in poor figure-ground segmentation performance. Conversely, the neuroscience-inspired model significantly outperforms all optical flow models on this task. For a direct comparison to human perception, we conduct a psychophysical study using a shape identification task as a proxy to measure human segmentation performance. All state-of-the-art optical flow models fall short of human performance, but only the motion energy model matches human capability.


A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation

Neural Information Processing Systems

The goal of *generalized* few-shot semantic segmentation (GFSS) is to recognize *novel-class* objects through training with a few annotated examples and the *base-class* model that learned the knowledge about the base classes. Unlike classic few-shot semantic segmentation, GFSS aims to classify pixels into both base and novel classes, making it a more practical setting. Current GFSS methods rely on several techniques, such as combinations of customized modules, carefully designed loss functions, meta-learning, and transductive learning. However, we found that a simple rule and standard supervised learning substantially improve GFSS performance. In this paper, we propose a simple yet effective method for GFSS that does not use the techniques mentioned above. Also, we theoretically show that our method perfectly maintains the segmentation performance of the base-class model over most of the base classes. Through numerical experiments, we demonstrated the effectiveness of our method. It improved novel-class segmentation performance in the $1$-shot scenario by $6.1$% on the PASCAL-$5^i$ dataset, $4.7$% on the PASCAL-$10^i$ dataset, and $1.0$% on the COCO-$20^i$ dataset. Our code is publicly available at https://github.com/IBM/BCM.


Panoramic Out-of-Distribution Segmentation

Duan, Mengfei, Zhang, Yuheng, Cao, Yihong, Teng, Fei, Luo, Kai, Zhang, Jiaming, Yang, Kailun, Li, Zhiyong

arXiv.org Artificial Intelligence

Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception, which is critical to applications such as autonomous driving and augmented reality. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to pixel distortions and background clutter. To address these issues, we introduce a new task, Panoramic Out-of-distribution Segmentation (PanOoS), with the aim of achieving comprehensive and safe scene understanding. Furthermore, we propose the first solution, POS, which adapts to the characteristics of panoramic images through text-guided prompt distribution learning. Specifically, POS integrates a disentanglement strategy designed to materialize the cross-domain generalization capability of CLIP. The proposed Prompt-based Restoration Attention (PRA) optimizes semantic decoding by prompt guidance and self-adaptive correction, while Bilevel Prompt Distribution Learning (BPDL) refines the manifold of per-pixel mask embeddings via semantic prototype supervision. Besides, to compensate for the scarcity of PanOoS datasets, we establish two benchmarks: DenseOoS, which features diverse outliers in complex environments, and QuadOoS, captured by a quadruped robot with a panoramic annular lens system. Extensive experiments demonstrate the superior performance of POS, with AuPRC improving by 34.25% and FPR95 decreasing by 21.42% on DenseOoS, outperforming state-of-the-art pinhole-OoS methods. Moreover, POS achieves leading closed-set segmentation capabilities and advances the development of panoramic understanding. Code and datasets will be available at https://github.com/MengfeiD/PanOoS.